Introduction

In response to a severe lack of reporting within government sources, The Washington Post compiled a database of every fatal police shooting in the United States from 2015 to 2022. We are interested in exploring this data, then researching a specific variable in the data set: the use of body cameras during fatal shootings.

This exploratory data analysis is divided into five main parts: first, we organize the data; second, we reshape the data for state- and region-based comparative analyses and build four new variables (police spending, laws that mandate body camera usage, political leaning (Republican or Democrat) in 2020, and number of police officers); third, we ask a SMART research question about our data and attempt to answer it; fourth, we continue our research by asking a modeling SMART question and attempting to answer it; fifth, we summarize our conclusions.

To skip to the modeling part of this project, please go to Part 5: Investigating Causality.

Part 1: Setting Up the Data

First we load our packages. Then we read the data set from a CSV file called FPS22.csv.

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ tibble  3.1.8     ✔ purrr   0.3.4
## ✔ tidyr   1.2.1     ✔ stringr 1.4.1
## ✔ readr   2.1.3     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ plotly::filter() masks dplyr::filter(), stats::filter()
## ✖ dplyr::lag()     masks stats::lag()
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

After accounting for null values, the data set we are working with has 5,720 observations. Below we have provided a single sample observation:

Name        Date        Manner of Death  Armed  Age  Gender  Race  City
Tim Elliot  10/04/2022  Shot             Gun    53   M       A     Shelton

State  Signs of Mental Illness  Threat Level  Flee         Body Camera  Longitude  Latitude  Is Geocoding Exact?
WA     1                        TRUE          Not fleeing  FALSE        -123       47.2      TRUE

The total number of observations:

## [1] 5720
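The cleaning code itself is not shown in this report, so the following is only a minimal sketch of one way to read a CSV and drop incomplete rows in base R. The inline text stands in for FPS22.csv; the column subset and the second and third rows are made up for illustration.

```r
# Toy stand-in for FPS22.csv. The first row mirrors the sample observation
# above; the other two rows are invented for illustration only.
csv_text <- "name,age,state,body_camera
Tim Elliot,53,WA,FALSE
Example Person,,OR,TRUE
Another Person,41,KS,FALSE"

# Treat empty fields and the literal string NA as missing values.
raw <- read.csv(text = csv_text, na.strings = c("", "NA"),
                stringsAsFactors = FALSE)

# Keep only the rows that have no missing value in any column.
clean <- raw[complete.cases(raw), ]

print(nrow(raw))    # rows before removing incomplete observations
print(nrow(clean))  # rows after removing incomplete observations
```

With the real file, `read.csv("FPS22.csv", ...)` would replace the `text =` argument; which columns the authors actually required to be non-missing is an assumption here.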

Part 3: Reshaping the Data for State and Regional Comparative Analysis

After completing the exploratory data analysis, we decided to do some comparative analyses between states and regions to create a specific, measurable, achievable, relevant, and time-oriented research question to pursue for the remainder of the project.

To do this, we began by dividing the data into regions for easier visualization and comparative analysis. The regions divide each US state as follows:

Northwest (NW)  Southwest (SW)  Midwest (MW)  Southeast (SE)  Northeast (NE)
California      New Mexico      Illinois      Georgia         New York
Washington      Arizona         Wisconsin     Alabama         Rhode Island
Oregon          Texas           Indiana       Mississippi     Maryland
Nevada          Oklahoma        Michigan      Louisiana       Vermont
Idaho           Hawaii          Minnesota     Tennessee       Pennsylvania
Utah            -               Missouri      North Carolina  Maine
Montana         -               Iowa          South Carolina  New Hampshire
Colorado        -               Kansas        Florida         New Jersey
Wyoming         -               North Dakota  Arkansas        Connecticut
Arkansas        -               South Dakota  West Virginia   Massachusetts
Arkansas        -               Nebraska      DC              -
-               -               Ohio          Virginia        -

Fatal shootings in the Northwest United States:

## [1] 1551

Fatal shootings in the Southwest United States:

## [1] 1058

Fatal shootings in the Midwest United States:

## [1] 955

Fatal shootings in the Southeast United States:

## [1] 1668

Fatal shootings in the Northeast United States:

## [1] 488
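The region assignment and the per-region counts above could be produced along the following lines. This is a sketch only: the lookup covers just a handful of states from the table, the two-letter abbreviations are an assumption about how states are coded, and the incident rows are hypothetical.

```r
# Lookup from state abbreviation to region, following the table above.
# Only a few states are included to keep the sketch short.
region_of <- c(WA = "NW", OR = "NW", CA = "NW",
               TX = "SW", OK = "SW",
               KS = "MW", OH = "MW",
               GA = "SE", FL = "SE",
               NY = "NE", ME = "NE")

# Hypothetical incident records: one row per fatal shooting.
shootings <- data.frame(
  state = c("WA", "CA", "CA", "TX", "KS", "GA", "GA", "NY"),
  stringsAsFactors = FALSE
)

# Attach the region as a factor with a fixed set of levels.
shootings$regions <- factor(unname(region_of[shootings$state]),
                            levels = c("MW", "NE", "NW", "SE", "SW"))

# Count incidents per region.
region_counts <- table(shootings$regions)
print(region_counts)
```

A named vector keeps the mapping in one place, so a state can be moved to another region by editing a single entry.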

To investigate what might explain whether police body cameras were turned on or off, we added several new variables. These will help us understand why states differ in whether body cameras are used during fatal police shootings. We will provide some sample data for each new variable.

The first new variable we add is state spending per capita on police for the year 2021.

Then we add a new binary variable that indicates whether a state has a law requiring police officers to use their body cameras when interacting with members of the public. Not many states have such a law, so only Maryland, New Jersey, New Mexico, and South Carolina are given the value 1, while all other states are given the value 0.

We add data on the political affiliation of a state, measured by the state’s margin in the 2020 presidential election, that is, whether the state voted for President Trump or for President Biden. Negative values indicate a margin in favor of Trump and positive values indicate a margin in favor of Biden.

Finally, we add a variable that looks at a state’s quantity of police officers per 100K citizens.

Example data points are shown below:

State     Variable                            Value
Alabama   Police Spending Per Capita          USD$477
Maryland  Police Body Camera Laws             1
Iowa      2020 US Presidential Election Vote  -8
Florida   Police Quantity Per 100k Citizens   USD$477

We also created two sub-data sets by grouping the data by state and by region for visualization purposes. The contents of both groups are identical, besides their grouping.
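Attaching state-level variables to incident-level rows is a join on the state column. The sketch below uses base R's `merge`; the state values are taken from the sample rows printed in Part 5 (officer counts rounded as displayed there), while the incident rows are hypothetical and the authors' actual join code is not shown.

```r
# Hypothetical incident records.
shootings <- data.frame(
  state = c("WA", "WA", "OR", "KS"),
  body_camera = c(FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# One row per state with the four new variables.
state_vars <- data.frame(
  state = c("WA", "OR", "KS"),
  spendpc = c(608, 736, 553),      # police spending per capita
  bclaw = c(0, 0, 0),              # 1 if a body camera mandate law exists
  marg2020 = c(19, 16, -15),       # 2020 presidential election margin
  le_per_100k = c(320, 284, 467),  # officers per 100K citizens
  stringsAsFactors = FALSE
)

# Left join: every incident keeps its row and gains its state's values.
merged <- merge(shootings, state_vars, by = "state", all.x = TRUE)
print(merged)
```

Using `all.x = TRUE` keeps incidents from states that are missing in the lookup, which makes gaps visible as NA rather than silently dropping rows.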

Part 4: Research SMART Question and Answer

Within our data set of 5,720 observations of police shootings from 2015 to 2022 in the United States, is there a correlation between the U.S. state of observation and whether a body camera was turned on during the shooting?

The state data subgroup can be summarized as follows:

##     state              month               year           regions 
##  Length:1763        Length:1763        Length:1763        MW:285  
##  Class :character   Class :character   Class :character   NE:155  
##  Mode  :character   Mode  :character   Mode  :character   NW:503  
##                                                           SE:478  
##                                                           SW:342  
##                                                                   
##     spendpc         bclaw          marg2020      le_per_100k      stbcp      
##  Min.   : 390   Min.   :0.000   Min.   :-43.0   Min.   :284   Min.   :0.000  
##  1st Qu.: 547   1st Qu.:0.000   1st Qu.: -8.0   1st Qu.:378   1st Qu.:0.051  
##  Median : 633   Median :0.000   Median :  0.3   Median :438   Median :0.125  
##  Mean   : 665   Mean   :0.019   Mean   :  2.7   Mean   :442   Mean   :0.113  
##  3rd Qu.: 756   3rd Qu.:0.000   3rd Qu.: 19.0   3rd Qu.:479   3rd Qu.:0.146  
##  Max.   :1337   Max.   :1.000   Max.   : 87.0   Max.   :722   Max.   :1.000  
##      gen.p           smi.p           flee.p      att.p          armed.p     
##  Min.   :0.852   Min.   :0.000   Min.   :0   Min.   :0.333   Min.   :0.500  
##  1st Qu.:0.941   1st Qu.:0.224   1st Qu.:0   1st Qu.:0.583   1st Qu.:0.877  
##  Median :0.960   Median :0.255   Median :0   Median :0.631   Median :0.917  
##  Mean   :0.955   Mean   :0.264   Mean   :0   Mean   :0.662   Mean   :0.915  
##  3rd Qu.:0.973   3rd Qu.:0.286   3rd Qu.:0   3rd Qu.:0.750   3rd Qu.:0.949  
##  Max.   :1.000   Max.   :1.000   Max.   :0   Max.   :1.000   Max.   :1.000  
##      MoD.p          age.avg     Non_White_prop 
##  Min.   :0.667   Min.   :30.1   Min.   :0.000  
##  1st Qu.:0.914   1st Qu.:34.4   1st Qu.:0.364  
##  Median :0.928   Median :35.3   Median :0.529  
##  Mean   :0.936   Mean   :36.2   Mean   :0.479  
##  3rd Qu.:0.970   3rd Qu.:37.7   3rd Qu.:0.583  
##  Max.   :1.000   Max.   :53.7   Max.   :1.000

The region data subgroup can be summarized as follows:

##     state              month               year              spendpc    
##  Length:1763        Length:1763        Length:1763        Min.   : 390  
##  Class :character   Class :character   Class :character   1st Qu.: 547  
##  Mode  :character   Mode  :character   Mode  :character   Median : 633  
##                                                           Mean   : 665  
##                                                           3rd Qu.: 756  
##                                                           Max.   :1337  
##      bclaw          marg2020      le_per_100k      stbcp           gen.p      
##  Min.   :0.000   Min.   :-43.0   Min.   :284   Min.   :0.000   Min.   :0.852  
##  1st Qu.:0.000   1st Qu.: -8.0   1st Qu.:378   1st Qu.:0.051   1st Qu.:0.941  
##  Median :0.000   Median :  0.3   Median :438   Median :0.125   Median :0.960  
##  Mean   :0.019   Mean   :  2.7   Mean   :442   Mean   :0.113   Mean   :0.955  
##  3rd Qu.:0.000   3rd Qu.: 19.0   3rd Qu.:479   3rd Qu.:0.146   3rd Qu.:0.973  
##  Max.   :1.000   Max.   : 87.0   Max.   :722   Max.   :1.000   Max.   :1.000  
##      smi.p           flee.p      att.p          armed.p          MoD.p      
##  Min.   :0.000   Min.   :0   Min.   :0.333   Min.   :0.500   Min.   :0.667  
##  1st Qu.:0.224   1st Qu.:0   1st Qu.:0.583   1st Qu.:0.877   1st Qu.:0.914  
##  Median :0.255   Median :0   Median :0.631   Median :0.917   Median :0.928  
##  Mean   :0.264   Mean   :0   Mean   :0.662   Mean   :0.915   Mean   :0.936  
##  3rd Qu.:0.286   3rd Qu.:0   3rd Qu.:0.750   3rd Qu.:0.949   3rd Qu.:0.970  
##  Max.   :1.000   Max.   :0   Max.   :1.000   Max.   :1.000   Max.   :1.000  
##     age.avg     Non_White_prop 
##  Min.   :30.1   Min.   :0.000  
##  1st Qu.:34.4   1st Qu.:0.364  
##  Median :35.3   Median :0.529  
##  Mean   :36.2   Mean   :0.479  
##  3rd Qu.:37.7   3rd Qu.:0.583  
##  Max.   :53.7   Max.   :1.000

Figure 1: Normality

We will now check our data for normality:

Because the plot is relatively linear, we can conclude this data is close enough to normality for our purpose.
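The report does not say which plot was used, but "relatively linear" suggests a normal Q-Q plot. Under that assumption, a rough numeric companion to the visual check is the correlation between sample and theoretical quantiles. The data below are simulated and only stand in for the real variable.

```r
set.seed(1)

# Simulated stand-in for the variable being checked (not the real data).
x <- rnorm(200, mean = 0.1, sd = 0.05)

# Compute Q-Q coordinates without drawing the plot.
qq <- qqnorm(x, plot.it = FALSE)

# A correlation close to 1 means the Q-Q points lie near a straight line.
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```

Calling `qqnorm(x)` followed by `qqline(x)` would draw the plot itself; the correlation is only a convenience summary, not a formal normality test.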

Figure 2: Body Camera Usage by Region

Now let us look at the body camera proportions by state. In the bar graph below, TRUE signifies a police body camera that was on, while FALSE indicates the body camera was off:

Number of fatal shootings where the body camera was on:

##   body_camera   n
## 1        TRUE 905

Number of fatal shootings where the body camera was off:

##   body_camera    n
## 1       FALSE 5383

Figure 3: stbcp by Region

This scatter plot shows, for each state, the proportion of fatal shootings in which the body camera was on (the variable stbcp). Each point on the graph depicts one state’s proportion. We can see that there is very little variation in the Southwest, and considerable variation among states in the Midwest.

Finally, let us check the mean proportion of body cameras turned on (stbcp) across all states:

## [1] 0.113

And the median stbcp across all states:

## [1] 0.125
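As described above, stbcp is a per-state proportion of shootings with the camera on. A minimal sketch of how such a proportion can be computed in base R follows; the incident rows are hypothetical and the authors' own code may differ.

```r
# Hypothetical incidents: TRUE means the body camera was on.
shootings <- data.frame(
  state = c("WA", "WA", "WA", "WA", "OR", "OR", "KS", "KS", "KS", "KS"),
  body_camera = c(TRUE, FALSE, FALSE, FALSE,
                  FALSE, FALSE,
                  TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# The mean of a logical vector is the share of TRUE values,
# so a grouped mean gives each state's camera-on proportion.
stbcp <- tapply(shootings$body_camera, shootings$state, mean)
print(stbcp)

# Summaries across states.
print(mean(stbcp))
print(median(stbcp))
```

Note that a mean taken over states weights every state equally, while a mean taken over incident rows weights states by their number of shootings; which one the report uses is not stated.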

Model 1: Chi-square Test

We will now perform a chi-square test to see if there is a significant difference between the proportions of each state.

Null: There is no significant difference between US states in the proportion of body cameras turned on during police shootings.

Alternative: There is a significant difference between US states in the proportion of body cameras turned on during police shootings.

Significance Level: alpha = 0.05

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.051   0.125   0.113   0.146   1.000
## 
##  Pearson's Chi-squared test
## 
## data:  contable
## X-squared = 66994, df = 1900, p-value <2e-16

With a p-value below 2e-16, far smaller than our significance level of alpha = 0.05, we reject the null hypothesis and conclude that there are significant differences between states’ proportions of body camera usage during fatal police shootings.
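To make the mechanics of the test concrete, here is a small sketch of a chi-square test of independence on a state-by-camera-status table of counts. The three states and their counts are hypothetical, and this table is not the `contable` object used in the report.

```r
# Hypothetical counts of fatal shootings by state and camera status.
counts <- matrix(c(30, 170,
                   10, 190,
                   60, 140),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(state = c("A", "B", "C"),
                                 body_camera = c("TRUE", "FALSE")))

# Pearson's chi-square test of independence between state and camera status.
result <- chisq.test(counts)
print(result)

# Decision at the 0.05 significance level.
reject_null <- result$p.value < 0.05
print(reject_null)
```

For an r-by-c table the degrees of freedom are (r - 1)(c - 1), so a table with one row per state and two camera-status columns has one fewer degree of freedom than the number of states.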

Part 5: Investigating Causality

We now know there are significant differences in the level of body camera usage during police shootings among US states. Let us see if we can find out what drives those differences.

Our second SMART question: For the years 2021 and 2022, what variables influence a state’s proportion of body cameras turned on during fatal police shootings?

The variables we will study include:

  1. US region

  2. Law enforcement officers per 100,000 citizens

  3. Law enforcement spending per capita

  4. Body camera mandate laws

  5. 2020 presidential election voting

We will use multiple linear regression to build models that investigate whether any of these variables can be useful predictors of body cameras being turned on during fatal shootings in the United States.

Because most states’ body camera laws were enacted at the start of 2021, we will only look at data from 2021 and 2022. This reduces the number of cases in our original data set to 1,763.

First let us take a look at the new data set with its new variables (added in Part 3):

## # A tibble: 6 × 17
## # Groups:   state [6]
##   state month year  regions spendpc bclaw marg2020 le_per_1…¹  stbcp gen.p smi.p
##   <chr> <chr> <chr> <fct>     <dbl> <dbl>    <dbl>      <dbl>  <dbl> <dbl> <dbl>
## 1 WA    10    2022  NW          608     0       19       320. 0.0556 0.917 0.5  
## 2 OR    10    2022  NW          736     0       16       284. 0.04   0.96  0.36 
## 3 KS    10    2022  MW          553     0      -15       467. 0.125  0.938 0.188
## 4 CA    10    2022  NW          981     0       29       378. 0.146  0.944 0.255
## 5 CO    10    2022  NW          664     0       14       417. 0.0727 0.982 0.164
## 6 OK    10    2022  SW          487     0      -33       410. 0.128  1     0.277
## # … with 6 more variables: flee.p <dbl>, att.p <dbl>, armed.p <dbl>,
## #   MoD.p <dbl>, age.avg <dbl>, Non_White_prop <dbl>, and abbreviated variable
## #   name ¹​le_per_100k

Number of observations:

## [1] 1763

Figure 4: Body Camera Laws

The following figure depicts the relationship between body camera laws and stbcp.

Figure 5: Law Enforcement Officers Per 100,000 Citizens

The following figure depicts the relationship between law enforcement officers per 100K citizens and stbcp.

Figure 6: 2020 Election Voting Margin

This figure depicts the relationship between the 2020 US presidential election margin and stbcp.

Figure 7: Spending on Policing Per Capita

This figure depicts the relationship between the state spending on policing per capita and stbcp.

## Warning: Using size for a discrete variable is not advised.

Figure 8: Law Enforcement Officers Per 100,000 Citizens

This figure shows law enforcement officers per 100K citizens, grouped by region.

## Warning: Using size for a discrete variable is not advised.

Figure 9: 2020 Election Voting Margin

The following plot compares three variables: stbcp, 2020 election margin, and region.

Figure 10: Law Enforcement Officers Per 100,000 Citizens

Finally, here is a plot showing law enforcement officers per 100K citizens, police spending, and region.

Now that we are familiar with the data, we can start to model with our new state-wide data.

Model 2: A simple multiple linear regression model that uses all the new variables along with the region variable

## 
## Call:
## lm(formula = stbcp ~ (marg2020 + bclaw + regions + le_per_100k + 
##     spendpc), data = FD)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.1649 -0.0532  0.0032  0.0318  0.9610 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.05e-01   1.53e-02    6.84  1.1e-11 ***
## marg2020    -6.23e-04   1.66e-04   -3.76  0.00018 ***
## bclaw       -3.43e-02   1.44e-02   -2.38  0.01734 *  
## regionsNE   -4.59e-02   9.28e-03   -4.94  8.4e-07 ***
## regionsNW    3.64e-02   7.95e-03    4.58  4.9e-06 ***
## regionsSE    2.40e-02   6.44e-03    3.73  0.00020 ***
## regionsSW    5.96e-03   6.75e-03    0.88  0.37738    
## le_per_100k -6.46e-05   3.67e-05   -1.76  0.07900 .  
## spendpc      3.74e-05   2.01e-05    1.86  0.06262 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0787 on 1754 degrees of freedom
## Multiple R-squared:  0.114,  Adjusted R-squared:  0.11 
## F-statistic: 28.3 on 8 and 1754 DF,  p-value: <2e-16
##             GVIF Df GVIF^(1/(2*Df))
## marg2020    2.99  1            1.73
## bclaw       1.09  1            1.04
## regions     5.43  4            1.24
## le_per_100k 2.19  1            1.48
## spendpc     3.97  1            1.99
##        res
## 1 -0.07591
## 2 -0.10046
## 3  0.02029
## 4  0.01029
## 5 -0.05770
## 6  0.00461
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The VIF values for this model are all within the acceptable range.

With an R^2 of 0.114, this model is not very good at predicting statewide body camera usage. Next we drop the region variable and refit the model with the remaining predictors.
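For readers unfamiliar with the two diagnostics used here, the sketch below fits a multiple linear regression and computes R^2 and variance inflation factors by hand, using VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the other predictors. The data are simulated, the helper `vif_manual` is a hypothetical name, and the report's own VIF output was presumably produced by a package function instead.

```r
set.seed(42)
n <- 200

# Simulated state-level predictors (illustrative values, not the real data).
spendpc <- rnorm(n, mean = 665, sd = 150)
le_per_100k <- 300 + 0.2 * spendpc + rnorm(n, sd = 60)
marg2020 <- rnorm(n, mean = 0, sd = 20)
stbcp <- 0.1 + 1e-4 * spendpc - 2e-4 * le_per_100k -
  1e-3 * marg2020 + rnorm(n, sd = 0.05)
FD_sim <- data.frame(stbcp, marg2020, le_per_100k, spendpc)

# Multiple linear regression of stbcp on the three predictors.
fit <- lm(stbcp ~ marg2020 + le_per_100k + spendpc, data = FD_sim)
r2 <- summary(fit)$r.squared
print(r2)

# Hypothetical helper: VIF for each numeric predictor, computed from
# an auxiliary regression of that predictor on all the others.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

vifs <- vif_manual(FD_sim, c("marg2020", "le_per_100k", "spendpc"))
print(vifs)
```

A VIF of 1 means a predictor is uncorrelated with the others; common rules of thumb treat values above 5 or 10 as a sign of problematic multicollinearity.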

Model 3: Uses only the most helpful predictors from the previous model

## 
## Call:
## lm(formula = stbcp ~ (marg2020 + bclaw + spendpc + le_per_100k), 
##     data = FD)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.1722 -0.0554  0.0078  0.0320  0.9106 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.29e-01   1.51e-02    8.54  < 2e-16 ***
## marg2020    -1.26e-03   1.53e-04   -8.21  4.3e-16 ***
## bclaw       -1.76e-02   1.44e-02   -1.22     0.22    
## spendpc      1.17e-04   1.62e-05    7.25  6.3e-13 ***
## le_per_100k -2.05e-04   2.57e-05   -7.97  2.7e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0807 on 1758 degrees of freedom
## Multiple R-squared:  0.0653, Adjusted R-squared:  0.0632 
## F-statistic: 30.7 on 4 and 1758 DF,  p-value: <2e-16
##    marg2020       bclaw     spendpc le_per_100k 
##        2.42        1.03        2.45        1.02
##        res
## 1 -0.05538
## 2 -0.09714
## 3  0.00778
## 4  0.01558
## 5 -0.03122
## 6 -0.01597
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The VIF values for this model are all within the acceptable range.

With an R^2 of 0.0653, this model is even worse at predicting statewide body camera usage.

Model 4: Interaction of law enforcement spending per capita on quantity of officers per capita

## 
## Call:
## lm(formula = stbcp ~ (marg2020 + bclaw + regions + le_per_100k + 
##     spendpc + I(spendpc * le_per_100k)), data = FD)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.1631 -0.0504  0.0022  0.0321  0.9585 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               1.59e-01   4.23e-02    3.77  0.00017 ***
## marg2020                 -5.63e-04   1.71e-04   -3.29  0.00104 ** 
## bclaw                    -3.62e-02   1.45e-02   -2.50  0.01250 *  
## regionsNE                -4.72e-02   9.32e-03   -5.06  4.7e-07 ***
## regionsNW                 3.88e-02   8.12e-03    4.77  2.0e-06 ***
## regionsSE                 2.57e-02   6.54e-03    3.92  9.2e-05 ***
## regionsSW                 7.81e-03   6.88e-03    1.14  0.25610    
## le_per_100k              -1.83e-04   9.28e-05   -1.97  0.04915 *  
## spendpc                  -4.01e-05   5.94e-05   -0.68  0.49955    
## I(spendpc * le_per_100k)  1.63e-07   1.17e-07    1.39  0.16579    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0786 on 1753 degrees of freedom
## Multiple R-squared:  0.115,  Adjusted R-squared:  0.111 
## F-statistic: 25.4 on 9 and 1753 DF,  p-value: <2e-16
##        res
## 1 -0.08069
## 2 -0.10175
## 3  0.02266
## 4  0.01201
## 5 -0.05976
## 6  0.00391
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We are ignoring the VIF test for multicollinearity because we are using an interaction predictor.

With an R^2 of 0.115, this model is still not good, but it is slightly better than the others at predicting statewide body camera usage.
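The interaction in this model is written as `I(spendpc * le_per_100k)`, which adds the product of the two predictors as an extra column. As a sketch on simulated data (not the report's data), the following shows that this gives the same fitted values as R's `:` interaction operator when both predictors are numeric.

```r
set.seed(7)
n <- 150

# Simulated predictors spanning roughly the ranges seen in the summaries.
sim <- data.frame(spendpc = runif(n, 400, 1300),
                  le_per_100k = runif(n, 280, 720))
sim$stbcp <- 0.1 + 2e-7 * sim$spendpc * sim$le_per_100k +
  rnorm(n, sd = 0.05)

# Interaction written as an explicit product, as in the model above.
fit_I <- lm(stbcp ~ le_per_100k + spendpc + I(spendpc * le_per_100k),
            data = sim)

# The same interaction written with the formula operator.
fit_colon <- lm(stbcp ~ le_per_100k + spendpc + spendpc:le_per_100k,
                data = sim)

# Both models span the same columns, so their fitted values agree.
same_fit <- isTRUE(all.equal(unname(fitted(fit_I)),
                             unname(fitted(fit_colon))))
print(same_fit)
```

Because the product column is built from the two main effects, it is strongly correlated with them, which is why VIF values for interaction models are usually inflated and harder to interpret.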

Model 5: Testing lm3

Since lm3 is our best model (per our R^2), let’s try to predict body camera usage for a few made-up US states:

Please notice that Eleum and Faraam are identical except for their body camera laws. The same is true of GW and HW.

Now let’s plug these new “states” into our model:

##      fit    lwr   upr
## 1 0.0749 0.0439 0.106
##     fit   lwr   upr
## 1 0.133 0.113 0.152
##       fit    lwr    upr
## 1 0.00019 -0.032 0.0324
##     fit   lwr   upr
## 1 0.144 0.104 0.184
##     fit   lwr   upr
## 1 0.146 0.132 0.161
##    fit    lwr   upr
## 1 0.11 0.0784 0.142
##     fit    lwr   upr
## 1 0.106 0.0936 0.118
##      fit    lwr    upr
## 1 0.0697 0.0407 0.0986

We can see the difference in fitted values between Eleum and Faraam, as well as between GW and HW.
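The fit, lwr, and upr columns above have the shape returned by `predict` on a linear model with an interval requested. The sketch below reproduces that workflow on simulated data with two hypothetical states that differ only in `bclaw`; the model is a simplified additive one, and the choice of a confidence interval (rather than a prediction interval) is an assumption, since the report does not say which was used.

```r
set.seed(3)
n <- 200

# Simulated training data (illustrative only).
sim <- data.frame(marg2020 = rnorm(n, 0, 20),
                  bclaw = rbinom(n, 1, 0.2),
                  spendpc = rnorm(n, 665, 150),
                  le_per_100k = rnorm(n, 440, 80))
sim$stbcp <- 0.12 - 1e-3 * sim$marg2020 + 0.03 * sim$bclaw +
  1e-4 * sim$spendpc - 2e-4 * sim$le_per_100k + rnorm(n, sd = 0.05)

fit <- lm(stbcp ~ marg2020 + bclaw + spendpc + le_per_100k, data = sim)

# Two made-up states, identical except for the body camera law.
state_a <- data.frame(marg2020 = 5, bclaw = 0,
                      spendpc = 650, le_per_100k = 440)
state_b <- transform(state_a, bclaw = 1)

pred_a <- predict(fit, newdata = state_a, interval = "confidence")
pred_b <- predict(fit, newdata = state_b, interval = "confidence")
print(pred_a)
print(pred_b)

# In an additive model, the gap between the two fits is the bclaw coefficient.
gap <- pred_b[1, "fit"] - pred_a[1, "fit"]
print(gap)
```

In a model where `bclaw` enters only additively, pairs of states that differ only in the law always differ in fitted value by the same amount, whatever their other characteristics.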

Part 6: Conclusion

Studying the use of body cameras in police work is important for data-driven policy research in the United States. While we hoped to link the association between the U.S. state of observation and whether the body camera was on or off during the shooting to state policy on body cameras or to some other state-level variable, we were unable to find a strong correlation. Although lm3 was our best model, it is still not a great predictor of statewide body camera usage, which leads us to the following conclusions:

  1. Both regional and state groupings demonstrated quantifiable differences in the proportion of body cameras turned on or off during fatal police shootings.

  2. The number of law enforcement officers per capita does not influence whether a body camera is turned on or off.

  3. State spending on policing per capita does not influence whether a body camera is turned on or off.

  4. The political affiliation of a state does not influence whether a body camera is turned on or off.

  5. Body camera mandate laws, present in Maryland, New Jersey, New Mexico, and South Carolina, slightly influence whether a body camera is turned on or off.

Because body camera laws are relatively new, it will be interesting to evaluate changes in stbcp as more states adopt such laws. This research project suggests that they may be the best chance states have of increasing camera usage during active police duty.